8 research outputs found

    An Algebraic Characterization of Total Input Strictly Local Functions

    This paper provides an algebraic characterization of the total input strictly local functions. Simultaneous, noniterative rules of the form A→B/C_D, common in phonology, are definable as functions in this class whenever CAD represents a finite set of strings. The algebraic characterization highlights a fundamental connection between input strictly local functions and the simple class of definite string languages, as well as connections to string functions studied in the computer science literature, the definite functions and local functions. No effective decision procedure for the input strictly local maps was previously available, but one arises directly from this characterization. This work also shows that, unlike the full class, a restricted subclass is closed under composition. Additionally, some products are defined which may yield new factorization methods.
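
    The rule-to-function connection can be made concrete with a small sketch. The following is a minimal, hypothetical illustration (not code from the paper) of how a simultaneous, noniterative rule such as a→b/c_d is computed by an input strictly local function: each output symbol is determined by a bounded window over the input alone, never by previously rewritten output.

```python
# Minimal sketch (illustrative, not from the paper): an input strictly
# local (ISL) function computing the simultaneous rule a -> b / c _ d.
# Every output symbol depends only on a bounded window over the input,
# never on previously rewritten output, which is what makes the map ISL.

def isl_rewrite(s: str, target="a", repl="b", left="c", right="d") -> str:
    out = []
    for i, sym in enumerate(s):
        prev = s[i - 1] if i > 0 else "#"         # '#' marks a word edge
        nxt = s[i + 1] if i + 1 < len(s) else "#"
        # The decision uses only the width-3 input window (prev, sym, nxt).
        out.append(repl if (sym, prev, nxt) == (target, left, right) else sym)
    return "".join(out)

assert isl_rewrite("cad") == "cbd"    # rule applies
assert isl_rewrite("cada") == "cbda"  # noniterative: rewrites never feed each other
assert isl_rewrite("aad") == "aad"    # left context missing, no change
```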

    Relativized Adjacency

    For each class in the piecewise-local subregular hierarchy, a relativized (tier-based) variant is defined. Algebraic as well as automata-, language-, and model-theoretic characterizations are provided for each of these relativized classes, except in cases where this is provably impossible. These various characterizations are necessarily intertwined due to the well-studied logic-automaton connection and the relationship between finite-state automata and (syntactic) semigroups. Closure properties of each class are demonstrated by using automata-theoretic methods to provide constructive proofs for the closures that do hold and giving language-theoretic counterexamples for those that do not. The net result is that, rather than merely existing as an operationally defined parallel set of classes, these relativized variants integrate cleanly with the other members of the piecewise-local subregular hierarchy from every perspective. Relativization may even prove useful in the characterization of the star-free class, as every star-free stringset is the preprojection of another (also star-free) stringset whose syntactic semigroup is not a monoid.
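
    The notion of relativized adjacency can be illustrated with a short sketch. Below is a minimal, hypothetical example (not from the paper) of a tier-based strictly 2-local grammar: symbols off the tier are erased before adjacency is checked, so tier symbols constrain one another at arbitrary string distance. The alphabet and constraint are invented for illustration.

```python
# Minimal sketch (illustrative): relativized (tier-based) adjacency.
# A tier-based strictly 2-local grammar forbids certain bigrams, but
# adjacency is computed on the projection to a tier rather than on the
# raw string.

def tier_projection(s: str, tier: set) -> str:
    """Erase every symbol not on the tier; survivors become adjacent."""
    return "".join(sym for sym in s if sym in tier)

def tsl2_accepts(s: str, tier: set, forbidden: set) -> bool:
    p = "#" + tier_projection(s, tier) + "#"   # '#' marks word edges
    return all(p[i:i+2] not in forbidden for i in range(len(p) - 1))

# Hypothetical sibilant-harmony-style constraint: 's' and 'S' may not
# co-occur, however far apart, because they are adjacent on the tier.
SIB = {"s", "S"}
assert tsl2_accepts("sokos", SIB, forbidden={"sS", "Ss"})
assert not tsl2_accepts("sokoS", SIB, forbidden={"sS", "Ss"})
```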

    Extracting Subregular constraints from Regular stringsets

    We introduce algorithms that, given a finite-state automaton (FSA), compute a minimal set of forbidden local factors that define a Strictly Local (SL) tight approximation of the stringset recognised by the FSA, and the set of forbidden piecewise factors that define a Strictly Piecewise (SP) tight approximation of that stringset. They also compute a set of co-SL factors that, together with the SL and SP factors, provide a set of purely conjunctive literal constraints defining a minimal superset of the stringset recognised by the automaton. Using these, we have built computational tools that have allowed us to reproduce, by nearly purely computational means, the work of Rogers and his co-workers (Rogers et al., 2012), in which, using a mix of computational and analytical techniques, they completely characterised, with respect to the Local and Piecewise Subregular hierarchies, the constraints on the distribution of stress in human languages that are documented in the StressTyp2 database. Our focus in this paper is on the algorithms and the method of their application. The phonology of stress patterns is a particularly good domain of application since, as we show here, such patterns generally fall at the very lowest levels of complexity. We discuss these phonological results here, but do not consider their consequences in depth.
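
    One way the SL approximation step can be realized (a sketch under simplifying assumptions, not the paper's actual algorithm) is to call a k-factor attested when some start-reachable state reads it and lands in a state from which a final state remains reachable; every unattested factor is then forbidden.

```python
# Minimal sketch (illustrative, not the paper's algorithm): forbidden
# k-factors of an SL-k approximation of the stringset of a deterministic FSA.

from itertools import product

def attested_factors(sigma, delta, start, finals, k):
    # delta: dict mapping (state, symbol) -> state (deterministic FSA)
    def closure(seed, edges):
        seen, frontier = set(seed), list(seed)
        while frontier:
            q = frontier.pop()
            for r in edges.get(q, ()):
                if r not in seen:
                    seen.add(r)
                    frontier.append(r)
        return seen

    fwd, bwd = {}, {}   # successor / predecessor relations
    for (q, a), r in delta.items():
        fwd.setdefault(q, set()).add(r)
        bwd.setdefault(r, set()).add(q)
    reachable = closure({start}, fwd)       # reachable from the start state
    coreachable = closure(set(finals), bwd) # can still reach a final state

    attested = set()
    for w in map("".join, product(sigma, repeat=k)):
        for q in reachable:
            r = q
            for a in w:
                r = delta.get((r, a))
                if r is None:
                    break
            if r is not None and r in coreachable:
                attested.add(w)
                break
    return attested

# a*b*: once a 'b' is read no 'a' may follow, so "ba" is the forbidden 2-factor.
delta = {(0, "a"): 0, (0, "b"): 1, (1, "b"): 1}
att = attested_factors("ab", delta, 0, {0, 1}, 2)
assert {"".join(w) for w in product("ab", repeat=2)} - att == {"ba"}
```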

    Typology emerges from simplicity in representations and learning

    We derive well-understood and well-studied subregular classes of formal languages purely from the computational perspective of algorithmic learning problems. We parameterise the learning problem along dimensions of representation and inference strategy. Of special interest are those classes of languages whose learning algorithms are guaranteed not to be prohibitively expensive in space and time, since learners are often exposed to adverse conditions and sparse data. Learned natural language patterns are expected to be most like the patterns in these classes, an expectation supported by previous typological and linguistic research in phonology. A second result is that the learning algorithms presented here are completely agnostic to the choice of linguistic representation. In the case of the subregular classes, the results fall out from traditional model-theoretic treatments of words and strings. The same learning algorithms, however, can be applied to model-theoretic treatments of other linguistic representations, such as syntactic trees or autosegmental graphs, which opens a useful direction for future research.
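
    A representative member of this family of cheap learners is the string extension learner for the SL-k languages; the sketch below (illustrative, not taken from the paper) shows why such learning is incremental and inexpensive: the grammar is simply the set of k-factors observed so far.

```python
# Minimal sketch (illustrative): a string extension learner for SL-k.
# Learning is a single pass, order-independent, and cheap in space and
# time; each datum can only grow the grammar.

def factors(s: str, k: int) -> set:
    padded = "#" * (k - 1) + s + "#" * (k - 1)   # '#' marks word edges
    return {padded[i:i+k] for i in range(len(padded) - k + 1)}

def sl_learn(sample, k: int) -> set:
    grammar = set()
    for s in sample:
        grammar |= factors(s, k)
    return grammar

def sl_accepts(s: str, k: int, grammar: set) -> bool:
    return factors(s, k) <= grammar

g = sl_learn(["ab", "aab", "abb"], 2)
assert sl_accepts("aaabbb", 2, g)   # generalizes to unseen strings
assert not sl_accepts("ba", 2, g)   # the factor "#b" was never attested
```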

    Robust Identification in the Limit from Incomplete Positive Data

    Intuitively, a learning algorithm is robust if it can succeed despite adverse conditions. We examine conditions under which learning algorithms for classes of formal languages are able to succeed when the data presentations are systematically incomplete; that is, when certain kinds of examples are systematically absent. One motivation comes from linguistics, where the phonotactic pattern of a language may be understood as the intersection of formal languages, each of which formalizes a distinct linguistic generalization. We examine under what conditions these generalizations can be learned when the only data available to a learner belongs to their intersection. In particular, we provide three formal definitions of robustness in the identification in the limit from positive data paradigm, and several theorems that describe the kinds of classes of formal languages which are, and are not, robustly learnable in the relevant sense. We relate these results to classes relevant to natural language phonology.
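
    The setting can be made concrete with a small illustration (our own sketch, not from the paper): the target pattern is the intersection of two SL-2 generalizations, the learner only ever sees strings satisfying both, and a string extension learner still converges to a grammar that encodes each generalization.

```python
# Minimal sketch (illustrative): learning from systematically incomplete
# data. The target phonotactic pattern intersects two SL-2 generalizations,
# forbid "ba" and forbid "bb", yet every datum satisfies *both*, so
# examples violating exactly one constraint are systematically absent.

def factors(s: str) -> set:
    p = "#" + s + "#"   # '#' marks word edges
    return {p[i:i+2] for i in range(len(p) - 1)}

# Data presentation: only strings from the intersection.
sample = ["a", "aa", "ab", "aab", "aaab"]
grammar = set().union(*(factors(s) for s in sample))

accepts = lambda s: factors(s) <= grammar
assert not accepts("aba")    # "ba" forbidden: first generalization recovered
assert not accepts("abb")    # "bb" forbidden: second generalization recovered
assert accepts("aaaab")      # unseen but licit string accepted
```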

    TAYSIR Competition: Transformer+rnn: Algorithms to Yield Simple and Interpretable Representations

    This article presents the content of the competition Transformers+rnn: Algorithms to Yield Simple and Interpretable Representations (TAYSIR, the Arabic word for 'simple'), an on-line challenge, held in Spring 2023, on extracting simpler models from already-trained neural networks. These neural nets were trained on sequential categorical/symbolic data. Some of these data were artificial; some came from real-world problems (such as Natural Language Processing, Bioinformatics, and Software Engineering). The trained models covered a large spectrum of architectures, from Simple Recurrent Neural Networks (SRNs) to Transformers, including Gated Recurrent Units (GRUs) and Long Short-Term Memory (LSTM) networks. No constraint was placed on the surrogate models submitted by the participants: any model working on sequential data was accepted. Two tracks were proposed: neural networks trained on Binary Classification tasks, and neural networks trained on Language Modeling tasks. The evaluation of the surrogate models took into account both the simplicity of the extracted model and the quality of its approximation of the original model.
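
    As a rough illustration of how such an evaluation might be scored (a hypothetical sketch; the competition's actual metric is not given here), one can combine agreement with the original network on held-out sequences with a penalty on surrogate size.

```python
# Hypothetical scoring sketch, not the competition's actual metric:
# approximation quality (agreement on held-out sequences) traded off
# against a simplicity penalty on the surrogate's size.

def score_surrogate(original, surrogate, sequences, size, lam=1e-4):
    """original/surrogate: callables mapping a sequence to a label.
    size: surrogate complexity, e.g. number of states or parameters."""
    agree = sum(original(s) == surrogate(s) for s in sequences)
    accuracy = agree / len(sequences)
    return accuracy - lam * size   # higher is better

# Hypothetical usage: a 5-state DFA surrogate for a binary classifier.
original  = lambda s: s.count("a") % 2 == 0   # stands in for the network
surrogate = lambda s: s.count("a") % 2 == 0   # a perfect extraction
print(score_surrogate(original, surrogate, ["aa", "ab", "ba", "aab"], size=5))
```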